Yuxuan Zhang
4/25/2020
Datasets source: https://data.world/associatedpress/johns-hopkins-coronavirus-case-trackerdatasets
Over the past two months, COVID-19 has spread across the U.S. Unfortunately, the situation has not been brought under control and is still deteriorating. Compared with severely affected cities such as New York and Los Angeles, some small counties are relatively safe. Because COVID-19 is highly infectious, the severity of an outbreak is related to population size and population mobility. Weather, location, and urbanization may also affect the spread of COVID-19, so it is necessary to analyze where and under what conditions the disease spreads. This dashboard focuses on the relationships among COVID-19 variables. It lets users compare different variables and view them through several kinds of visualization.
This dashboard contains three parts, which help users explore the distribution of COVID-19 from three different aspects.
# import modules and packages
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import bqplot
import ipywidgets
import matplotlib.colors as mpl_colors
import seaborn as sns
import json
import plotly.express as px
import plotly.graph_objs as go
from urllib.request import urlopen
import PIL.Image as Image
# import and show file
cases = pd.read_csv('/Users/Sophie9w9/Desktop/1_county_level_confirmed_cases.csv')
# data cleaning
# drop records whose total_population is NaN or 0, since they would distort the results
new_cases = cases[(cases['total_population'].notnull()) & (cases['total_population'] != 0)]
# select the numerical and categorical columns separately for the interactive plot below
numerial = cases[['total_population', 'confirmed', 'confirmed_per_100000', 'deaths', 'deaths_per_100000']]
categorial = cases[['state', 'county_name', 'lat', 'lon', 'NCHS_urbanization']]
# sum the population, deaths, and confirmed cases for each state and urbanization level
population = cases["total_population"].groupby([cases["state"], cases["NCHS_urbanization"]]).sum()
total_death = cases["deaths"].groupby([cases["state"], cases["NCHS_urbanization"]]).sum()
total_confirm = cases["confirmed"].groupby([cases["state"], cases["NCHS_urbanization"]]).sum()
# combine the results
comb1 = pd.concat([population, total_death], axis=1)
comb2 = pd.concat([comb1, total_confirm], axis=1).reset_index()
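# delete abnormal data
# drop the Northern Mariana Islands, whose recorded population is 0
# .copy() makes final_record an independent frame, so columns can be added later without a SettingWithCopyWarning
final_record = comb2[comb2['total_population'] != 0].copy()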
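The three groupby calls and two concatenations above can also be expressed as a single aggregation. The following is a minimal sketch on a small made-up frame; the values are invented for illustration and only the column names are taken from the CSV used here.

```python
import pandas as pd

# Toy data for illustration only; the values are invented, not from the dataset.
toy = pd.DataFrame({
    "state": ["A", "A", "A", "B"],
    "NCHS_urbanization": ["Large", "Large", "Small", "Small"],
    "total_population": [100, 200, 50, 0],
    "deaths": [1, 2, 0, 0],
    "confirmed": [10, 20, 5, 0],
})

value_cols = ["total_population", "deaths", "confirmed"]

# One groupby over both keys sums all three columns at once.
summary = (
    toy.groupby(["state", "NCHS_urbanization"])[value_cols]
    .sum()
    .reset_index()
)

# Drop groups with zero population; .copy() yields an independent frame.
summary = summary[summary["total_population"] != 0].copy()
print(summary)
```

Grouping once keeps the three sums aligned on the same index by construction, so no concatenation step is needed.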
# process data for the heatmap visualization
# find distinct states and store them into a list
states = new_cases['state'].unique()
states_new = states.tolist()
# find distinct urbanization categories and store them in a list
urbanizations = new_cases['NCHS_urbanization'].unique()
urbanizations_new = urbanizations.tolist()
# np.histogram2d only accepts numerical variables, so the states and urbanization categories must be mapped to distinct integers
# transform states
def set_index(x):
    if x in states_new:
        return states_new.index(x)
# transform urbanization categories
def set_index2(x):
    if x in urbanizations_new:
        return urbanizations_new.index(x)
# add the new numbers to the dataframe
final_record.loc[:, 'state_number'] = final_record['state'].apply(set_index)
final_record.loc[:, 'urbanization_number'] = final_record['NCHS_urbanization'].apply(set_index2)
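The two helper functions look up each label with list.index, which scans the list once per row. As a possible alternative, pandas.factorize assigns integer codes in a single call. The sketch below uses made-up labels. Note that factorize numbers the labels in order of first appearance in the series it is given, so the codes may differ from the order in states_new, and missing values receive the code -1.

```python
import pandas as pd

# Toy labels for illustration only.
labels = pd.Series(["Texas", "Ohio", "Texas", "Iowa"])

# factorize returns one integer code per row plus the unique labels,
# numbered in order of first appearance.
codes, uniques = pd.factorize(labels)

print(codes.tolist())   # integer code for each row
print(list(uniques))    # label that each code stands for
```

The uniques array doubles as a lookup table: uniques[code] gives back the original label, which is useful for labelling heatmap cells.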
@ipywidgets.interact(x_axis = categorial.columns, y_axis = categorial.columns, colormap = numerial.columns)
def create_plot(x_axis, y_axis, colormap):
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.scatter(new_cases[x_axis], new_cases[y_axis], c=np.log10(new_cases[colormap]))
    ax.set_xlabel(x_axis)
    ax.set_ylabel(y_axis)
    plt.show()
Image.open('visualization 1.png')
The first visualization is a highly interactive dashboard. Users can select any columns they want, and the selection is drawn as a scatter plot: the two axes come from the categorical columns (state, county name, latitude, longitude, urbanization), and the color of each dot encodes a numerical column on a log10 scale. This lets users see the relationship among any three columns directly. For example, we can view deaths by longitude and latitude. This visualization gives users a general overview of the dataset.
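Because the dot colors use np.log10, counties with a value of 0 (for example, no deaths) produce -inf and a runtime warning. One way to handle this, sketched below, is to take the logarithm only where the value is positive and mark the rest as NaN. The helper name safe_log10 is hypothetical and not part of the notebook; how NaN-colored points are drawn depends on the plotting library.

```python
import numpy as np

def safe_log10(values):
    """Return log10 of values, with NaN wherever the input is not positive.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    arr = np.asarray(values, dtype=float)
    out = np.full(arr.shape, np.nan)
    positive = arr > 0
    # Only positive entries are passed to log10, so no -inf or warnings occur.
    out[positive] = np.log10(arr[positive])
    return out

colors = safe_log10([0, 10, 1000])
print(colors)
```

In create_plot, such a helper could stand in for the direct np.log10 call on the selected column.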
nlong = 51
nlat = 5
hist2d, x_edges, y_edges = np.histogram2d(final_record['state_number'],
                                          final_record['urbanization_number'],
                                          weights=final_record['total_population'],
                                          bins=[nlong, nlat])
np.log10(hist2d).min()
np.log10(hist2d).max()
# set the "bad" values
hist2d[hist2d<= 0] = np.nan
hist2d = np.log10(hist2d)
np.nanmin(hist2d)
# create label
mySelectedLabel = ipywidgets.Label() # print out info about selection
x_centers = (x_edges[: -1] + x_edges[1:]) / 2
y_centers = (y_edges[: -1] + y_edges[1:]) / 2
# create heatmap
# scales
x_sc = bqplot.LinearScale()
y_sc = bqplot.LinearScale()
col_sc = bqplot.ColorScale(scheme='RdPu', min=np.nanmin(hist2d), max=np.nanmax(hist2d))
# axis
x_ax = bqplot.Axis(scale=x_sc, label='Urbanization')
y_ax = bqplot.Axis(scale=y_sc, label='State', orientation='vertical')
c_ax = bqplot.ColorAxis(scale=col_sc, orientation='vertical', side='right')
# marks
heat_map = bqplot.GridHeatMap(color=hist2d,
                              row=x_centers,
                              column=y_centers,
                              scales={'color': col_sc, 'row': y_sc, 'column': x_sc},
                              interactions={'click': 'select'},
                              anchor_style={'fill': 'blue'})
# create scatter plot
# scales
x_scl = bqplot.LinearScale()
y_scl = bqplot.LinearScale()
# axis
ax_xcl = bqplot.Axis(label='Confirmed', scale=x_scl)
ax_ycl = bqplot.Axis(label='Death', scale=y_scl,
                     orientation='vertical', side='left')
# marks
i, j = 25, 0
xs = [x_edges[i], x_edges[i+1]]
ys = [y_edges[j], y_edges[j+1]]
# mask for the initial bin (computed here but not applied: the scatter plot starts with all records)
region_mask = ((final_record['state_number'] >= xs[0]) & (final_record['state_number'] <= xs[1]) &
               (final_record['urbanization_number'] >= ys[0]) & (final_record['urbanization_number'] <= ys[1]))
scatter = bqplot.Scatter(x=final_record['confirmed'],
                         y=final_record['deaths'],
                         scales={'x': x_scl, 'y': y_scl})
# link heatmap with scatter plot
def get_data_value(change):
    # to make sure we only support single selections
    if len(change['owner'].selected) == 1:  # *only* 1 selection
        j, i = change['owner'].selected[0]
        v = hist2d[j, i]
        # hist2d holds the log10 of the summed total_population, so label it accordingly
        mySelectedLabel.value = 'log10(Total Population) = ' + str(v)
        # update scatter plot
        xs = [x_edges[j], x_edges[j+1]]
        ys = [y_edges[i], y_edges[i+1]]
        # keep only the records that fall in the selected bin
        region_mask = ((final_record['state_number'] >= xs[0]) & (final_record['state_number'] <= xs[1]) &
                       (final_record['urbanization_number'] >= ys[0]) & (final_record['urbanization_number'] <= ys[1]))
        scatter.x = final_record['confirmed'][region_mask]
        scatter.y = final_record['deaths'][region_mask]
# make sure we "observe" for a change in our heatmap (traitlets)
heat_map.observe(get_data_value, 'selected')
# create figure
fig_heatmap = bqplot.Figure(marks=[heat_map], axes=[c_ax, y_ax, x_ax])
fig_plot = bqplot.Figure(marks=[scatter], axes=[ax_xcl, ax_ycl])
# put it all together
fig_heatmap.layout.min_width='400px'
fig_plot.layout.min_width = '400px'
plots = ipywidgets.HBox([fig_heatmap, fig_plot])
myDashboard = ipywidgets.VBox([mySelectedLabel, plots])
myDashboard
The second visualization focuses on urbanization and states. It contains a heatmap and a linked scatter plot. The x-axis of the heatmap is the urbanization level and the y-axis is the state. Each color block encodes the log10 of the total population for one state and urbanization level, so users can compare blocks across columns to see differences between urbanization levels and across rows to see differences between states. The scatter plot linked to the heatmap provides more detail: when users click a color block, the deaths and confirmed cases for that state and urbanization level are shown in the scatter plot. Compared with the first visualization, the second is a more detailed dashboard that lets users focus on specific data. Because the two charts are linked, users can find relationships and analyze the dataset more easily.
Image.open('visualization 2.png')
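The click callback above selects records by comparing the integer codes against bin edges. If there is exactly one bin per code, so that the clicked row and column indices equal the state and urbanization codes (an assumption that should be checked against nlong and nlat), the same selection can be written as a direct equality test. The sketch below uses invented values, and the function name rows_for_cell is hypothetical.

```python
import pandas as pd

# Toy stand-in for final_record; values are invented for illustration.
toy = pd.DataFrame({
    "state_number": [0, 0, 1, 1],
    "urbanization_number": [0, 1, 0, 1],
    "confirmed": [10, 20, 30, 40],
    "deaths": [1, 2, 3, 4],
})

def rows_for_cell(df, state_idx, urban_idx):
    """Return the rows whose integer codes match the clicked heatmap cell.

    Hypothetical helper; assumes one heatmap bin per integer code.
    """
    mask = (df["state_number"] == state_idx) & (df["urbanization_number"] == urban_idx)
    return df[mask]

selected = rows_for_cell(toy, 1, 0)
print(selected)
```

Equality matching avoids the case where a code that sits exactly on a shared edge is picked up by two neighboring bins, which the inclusive comparisons on both ends could allow.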
# import the U.S. map
with urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as f:
    us_map = json.load(f)
# show the COVID-19 data on the map
fig = px.choropleth_mapbox(
    data_frame=cases,
    geojson=us_map,
    color='confirmed',
    range_color=(0, 300),
    locations='fips_code',
    mapbox_style='carto-darkmatter',
    color_continuous_scale='viridis',
    center={"lat": 37.110573, "lon": -96.493924},
    opacity=0.8,
    zoom=3,
    labels={'confirmed': 'confirmed cases'},
)
fig.update_layout(title={'text': 'Confirmed Cases Distribution'})
fig.show()
The last visualization is a map that shows the confirmed cases in different areas. Confirmed cases are drawn as colored county regions, which helps users grasp the situation directly. The closer an area's color is to bright yellow, the more confirmed cases it has; the color scale is capped at 300 cases. The map can be zoomed in and out so that users can check the details.
Image.open('visualization 3.png')
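The choropleth matches each row to a county by comparing the locations column with the id of each GeoJSON feature. To our knowledge, the ids in the Plotly county file are five-character strings such as "01001". If pandas parsed fips_code as a number, the leading zeros are lost and those counties may not be drawn. The sketch below shows one way to restore them. It assumes the column is numeric with no missing values, and the column name fips_str is made up for illustration.

```python
import pandas as pd

# Toy FIPS values as they might look after being parsed as numbers.
toy = pd.DataFrame({"fips_code": [1001.0, 6037.0, 36061.0]})

# Convert to integer, then to a zero-padded five-character string.
toy["fips_str"] = toy["fips_code"].astype(int).astype(str).str.zfill(5)
print(toy["fips_str"].tolist())
```

Alternatively, passing dtype={'fips_code': str} to pd.read_csv keeps the codes as text from the start, provided the file itself stores them with leading zeros.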